Dataset Description
A total of 7880 individuals from 2611 families were genotyped on either the Illumina Human1M-Duov3_B or the Human1Mv1_C array.
- 4901 males, 2979 females.
- 2571 trios, 36 quads, 1 five-member family and 3 six-member families.
- 947,233 SNPs were genotyped.
- Coordinates are based on NCBI Build 36 (hg18).
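As a consistency check, the family-structure counts can be reconciled with the sample totals. This assumes every family is complete at its stated size (trio = 3 members, quad = 4, and so on); the variable names are illustrative only.

```python
# Family size -> number of families, as listed above.
family_counts = {3: 2571, 4: 36, 5: 1, 6: 3}  # trios, quads, 5- and 6-member families

n_families = sum(family_counts.values())
# Assumes every family is complete at its stated size.
n_individuals = sum(size * count for size, count in family_counts.items())
n_by_sex = 4901 + 2979  # males + females

print(n_families, n_individuals, n_by_sex)  # 2611 7880 7880
```

Both routes give 7880 individuals, so the family breakdown and the sex breakdown agree with the total.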
Raw Genotype QC
Sex Check
- 141 samples flagged as PROBLEM:
  - 115 with completely missing chrX genotypes.
  - 26 with chrX F ranging from 0.20 to 0.62.
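The PROBLEM status resembles PLINK `--check-sex` output. The sketch below shows how such flags arise, assuming PLINK's default thresholds (F below 0.2 calls female, F above 0.8 calls male). The thresholds actually used for this dataset are not stated above, and the constant and function names are hypothetical.

```python
import math
from typing import Optional

FEMALE_MAX_F = 0.2  # assumed PLINK default: chrX F below this -> female
MALE_MIN_F = 0.8    # assumed PLINK default: chrX F above this -> male

def infer_sex(chrx_f: Optional[float]) -> Optional[str]:
    """Return 'M', 'F', or None when sex cannot be inferred from chrX F."""
    if chrx_f is None or math.isnan(chrx_f):
        return None  # e.g. all chrX genotypes missing, so F is undefined
    if chrx_f > MALE_MIN_F:
        return "M"
    if chrx_f < FEMALE_MAX_F:
        return "F"
    return None  # ambiguous zone between the two thresholds

def sex_check_status(reported_sex: str, chrx_f: Optional[float]) -> str:
    """'OK' if the inferred sex matches the reported sex, else 'PROBLEM'."""
    return "OK" if infer_sex(chrx_f) == reported_sex else "PROBLEM"
```

For example, `sex_check_status("F", 0.45)` returns `"PROBLEM"`. Both failure modes reported above (missing chrX data and F values inside the 0.2 to 0.8 gap) lead to the same status even when the reported sex may be correct.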
[Figure: chrX F distributions]

Pairwise IBD estimation
- Relationship types (RT): OT (other), FS (full siblings), PO (parent-offspring).
- Families with FID 483 and FID 1012 have potential issues:
  - FID 483: IBD sharing = 1 between IID 328 (female) and IID 1491 (female), whose expected sharing is ~0. Are they monozygotic (MZ) twins, or duplicate samples of the same individual? Their genotype missing rates are nearly identical: 0.1185 for 483_328 and 0.1186 for 483_1491. [Drop IID 483_1491, 483_4371 and 483_993.]
  - FID 1012: IBD sharing = 0.57 between IID 2319 (female) and IID 3612 (female). They appear to be full siblings (FS) but were recruited into the same FID. All other relationships within FID 1012 are confirmed.
- IBD sharing for the remaining pairs ranges from 0.44 to 0.58 for FS, from 0.50 to 0.59 for PO, and from 0 to 0.12 for OT. The non-zero OT values indicate relatedness between some parents.
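The comparison between pedigree relationships and estimated IBD can be sketched as follows. This assumes PLINK `--genome`-style PI_HAT values and treats OT pairs, such as the two parents of a trio, as expected to share ~0. The 0.2 tolerance and the 0.9 duplicate/MZ cutoff are hypothetical choices, not values taken from this report.

```python
# Expected proportion of the genome shared IBD (PI_HAT) per relationship type.
# OT is assumed to mean pedigree-unrelated pairs (e.g. parents in a trio).
EXPECTED_PI_HAT = {"PO": 0.5, "FS": 0.5, "OT": 0.0}

def flag_pair(rt: str, pi_hat: float, tolerance: float = 0.2) -> str:
    """Compare an observed PI_HAT with the value expected from the pedigree."""
    if pi_hat > 0.9:
        # ~100% IBD: duplicate sample or monozygotic twins (hypothetical cutoff)
        return "duplicate_or_MZ"
    if abs(pi_hat - EXPECTED_PI_HAT[rt]) > tolerance:
        return "mismatch"
    return "ok"
```

Under these assumptions, the FID 483 pair (`flag_pair("OT", 1.0)`) is reported as a duplicate or MZ pair. The FS and PO ranges above (0.44 to 0.59) and OT values up to 0.12 all pass.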
[Figure: estimated pairwise IBD distributions]

Individual genome-wide heterozygosity
[Figure: genome-wide heterozygosity vs. missing rate]

Note that samples were genotyped on either the Human1M-Duov3_B or the Human1Mv1_C, and the genotypes for these individuals are the union of the SNPs from both platforms. Because every sample is necessarily missing the SNPs unique to the other array, missing rates were computed using only the SNPs shared by both arrays.
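Restricting the missing-rate calculation to shared SNPs can be sketched as below. This assumes each sample's genotypes are stored as a dict from SNP ID to allele count (None for a failed call). The SNP IDs and the function name are hypothetical.

```python
from typing import Dict, Optional, Set

def missing_rate_on_shared(genotypes: Dict[str, Optional[int]],
                           array_a: Set[str], array_b: Set[str]) -> float:
    """Fraction of missing calls, counted only over SNPs present on both arrays."""
    shared = array_a & array_b
    if not shared:
        raise ValueError("arrays share no SNPs")
    # A shared SNP that is absent from the sample's record also counts as missing.
    missing = sum(1 for snp in shared if genotypes.get(snp) is None)
    return missing / len(shared)

# Toy example: a sample typed on array A, which lacks rs5 from array B.
array_a = {"rs1", "rs2", "rs3", "rs4"}
array_b = {"rs2", "rs3", "rs4", "rs5"}
sample = {"rs1": 0, "rs2": 1, "rs3": None, "rs4": 2}
print(missing_rate_on_shared(sample, array_a, array_b))  # 1/3 (only rs3 fails)
```

Computed over the union of the two arrays, the same sample would have a missing rate of 2/5. The extra missing call (rs5) reflects the platform the sample was typed on, not sample quality.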
[Figure: genome-wide F vs. missing rate]
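Per-sample heterozygosity and F can be computed with the standard method-of-moments estimator, F = (O(HOM) - E(HOM)) / (N - E(HOM)). Here E(HOM) sums 1 - 2p(1 - p) over non-missing SNPs. PLINK `--het` reports quantities with these names, but its exact expected-homozygosity term may include corrections not shown here. The function below is a hypothetical sketch, not the pipeline's code.

```python
from typing import List, Optional, Tuple

def het_and_f(genotypes: List[Optional[int]],
              allele_freqs: List[float]) -> Tuple[float, float]:
    """Observed heterozygosity and method-of-moments inbreeding coefficient F.

    genotypes: allele counts (0, 1, 2) per SNP, None if missing.
    allele_freqs: population allele frequency p for each SNP.
    """
    n = 0          # non-missing genotypes, N
    obs_hom = 0    # observed homozygous calls, O(HOM)
    exp_hom = 0.0  # expected homozygous calls under HWE, E(HOM)
    for g, p in zip(genotypes, allele_freqs):
        if g is None:
            continue
        n += 1
        if g != 1:
            obs_hom += 1
        exp_hom += 1.0 - 2.0 * p * (1.0 - p)
    if n == 0 or n - exp_hom <= 0:
        raise ValueError("no informative genotypes")
    het = (n - obs_hom) / n
    f = (obs_hom - exp_hom) / (n - exp_hom)
    return het, f
```

F near 0 means homozygosity is as expected. Positive F means excess homozygosity, and negative F means excess heterozygosity (for example, from sample contamination).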

Imputation
Pre-imputation
The imputation pipeline follows the one used for the SSC dataset. A total of 7769 individuals, ~784K autosomal SNPs and ~22K chrX SNPs passing the filters below were used for imputation.
- filters: --geno 0.05 --mind 0.1 --maf 0.01 --hwe 1e-6
- 111 people removed due to missing genotype data (--mind); their missing rates ranged from 0.7 to 1.
- Total genotyping rate in the remaining samples is 0.914029.
- 124565 variants removed due to missing genotype data (--geno).
- 15633 variants removed by the Hardy-Weinberg exact test (--hwe).
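The four filters are sketched below on an in-memory genotype matrix (rows = samples, columns = variants, allele counts 0/1/2, None for missing). The sketch rests on a few assumptions:

- Samples are filtered before variants. PLINK's internal order may differ.
- The HWE p-value uses the exact test of Wigginton et al. (2005).
- Function names are hypothetical.

```python
from typing import List, Optional, Tuple

Genotype = Optional[int]  # allele count 0/1/2, None if missing

def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Two-sided HWE exact test p-value (Wigginton et al. 2005)."""
    n = n_het + n_hom1 + n_hom2
    n_rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the rarer allele
    probs = [0.0] * (n_rare + 1)
    # Start from the most likely het count (same parity as n_rare).
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if mid % 2 != n_rare % 2:
        mid += 1
    probs[mid] = 1.0
    hom_r = (n_rare - mid) // 2
    hom_c = n - mid - hom_r
    h, r, c = mid, hom_r, hom_c
    while h >= 2:  # walk towards fewer heterozygotes
        probs[h - 2] = probs[h] * h * (h - 1) / (4.0 * (r + 1) * (c + 1))
        h, r, c = h - 2, r + 1, c + 1
    h, r, c = mid, hom_r, hom_c
    while h <= n_rare - 2:  # walk towards more heterozygotes
        probs[h + 2] = probs[h] * 4.0 * r * c / ((h + 2) * (h + 1))
        h, r, c = h + 2, r - 1, c - 1
    p_obs = probs[n_het]
    return min(1.0, sum(p for p in probs if p <= p_obs) / sum(probs))

def qc_filter(matrix: List[List[Genotype]], mind: float = 0.1, geno: float = 0.05,
              maf: float = 0.01, hwe: float = 1e-6) -> Tuple[List[int], List[int]]:
    """Return indices of (kept samples, kept variants)."""
    # --mind: drop samples whose missing rate exceeds `mind`.
    samples = [i for i, row in enumerate(matrix)
               if sum(g is None for g in row) / len(row) <= mind]
    variants = []
    for j in range(len(matrix[0])):
        calls = [matrix[i][j] for i in samples]
        observed = [g for g in calls if g is not None]
        # --geno: drop variants whose missing rate exceeds `geno`.
        if not observed or 1 - len(observed) / len(calls) > geno:
            continue
        # --maf: drop variants whose minor allele frequency is below `maf`.
        p = sum(observed) / (2 * len(observed))
        if min(p, 1 - p) < maf:
            continue
        # --hwe: drop variants whose HWE exact-test p-value is below `hwe`.
        if hwe_exact_p(observed.count(1), observed.count(0), observed.count(2)) < hwe:
            continue
        variants.append(j)
    return samples, variants
```

As a cross-check on the sample counts: removing the 111 samples that failed `--mind` from the 7880 genotyped individuals leaves the 7769 individuals carried into imputation.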
After Imputation
Frequency distribution
- ~7.6M SNPs overlap between AGP_imputed and HRC_WGS (SNPs passing filters: --geno 0.05 --maf 0.01 --hwe 1e-6).
- Frequencies were compared on the same allele.
- No SNPs had a MAF difference > 0.2.
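The same-allele comparison can be sketched as below. This assumes each dataset is summarized as a dict from SNP ID to (allele 1, allele 2, frequency of allele 1). SNPs whose alleles match in swapped order are flipped to 1 - frequency, and SNPs with incompatible alleles are skipped. The record layout, the skip rule, and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: SNP ID -> (allele 1, allele 2, frequency of allele 1)
FreqTable = Dict[str, Tuple[str, str, float]]

def compare_freqs(study: FreqTable, reference: FreqTable,
                  max_diff: float = 0.2) -> Tuple[int, List[str]]:
    """Return (number of comparable SNPs, SNP IDs differing by more than max_diff)."""
    compared = 0
    flagged = []
    for snp, (a1, a2, f) in study.items():
        if snp not in reference:
            continue
        r1, r2, rf = reference[snp]
        if (a1, a2) == (r1, r2):
            ref_f = rf
        elif (a1, a2) == (r2, r1):
            ref_f = 1.0 - rf  # alleles swapped: express on the same allele
        else:
            continue  # incompatible alleles (e.g. strand issue): skip
        compared += 1
        if abs(f - ref_f) > max_diff:
            flagged.append(snp)
    return compared, flagged
```

Comparing frequencies on the same allele matters because an A/G SNP at 0.30 in one file and a G/A SNP at 0.68 in the other agree after alignment (0.30 vs. 0.32). Compared naively, they would look 0.38 apart.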

PCA
- The first 3 PCs, computed from LD-pruned HapMap3 SNPs, were projected onto the 1000 Genomes (1000G) reference.
- K-means clustering was used to calculate distances.
- Ancestry was assigned using a posterior probability threshold of 0.9.
- Assigned ancestries: 6548 Europeans (EUR), 625 Admixed Americans (AMR), 123 South Asians (SAS), 99 East Asians (EAS) and 141 Africans (AFR).
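A sketch of the distance-plus-posterior assignment step follows. It rests on several assumptions, none of which come from this report:

- In place of the K-means step described above, it uses per-population centroids computed from labeled reference samples.
- It converts squared distances into posterior probabilities using equal priors and an isotropic Gaussian of hypothetical width `sigma`.
- Samples below the 0.9 threshold are left unassigned.
- All names and parameter values are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Point = Sequence[float]  # coordinates on the first 3 PCs

def centroids(ref: Dict[str, List[Point]]) -> Dict[str, List[float]]:
    """Mean PC position of each labeled reference population."""
    return {pop: [sum(c) / len(pts) for c in zip(*pts)] for pop, pts in ref.items()}

def assign_ancestry(sample: Point, cents: Dict[str, List[float]],
                    sigma: float = 1.0, threshold: float = 0.9) -> Optional[str]:
    """Return the population whose posterior is >= threshold, else None."""
    # Squared Euclidean distance from the sample to each centroid.
    d2 = {pop: sum((x - y) ** 2 for x, y in zip(sample, c)) for pop, c in cents.items()}
    # Posterior under equal priors and isotropic Gaussians of width sigma;
    # subtracting the smallest distance first keeps exp() from underflowing.
    dmin = min(d2.values())
    w = {pop: math.exp(-(d - dmin) / (2 * sigma ** 2)) for pop, d in d2.items()}
    total = sum(w.values())
    best = max(w, key=lambda pop: w[pop])
    return best if w[best] / total >= threshold else None
```

A sample lying midway between two centroids gets a posterior of about 0.5 for each and stays unassigned. The threshold therefore trades coverage for confidence in the labels.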
